Overview

Dataset Statistics

Number of Variables 12
Number of Rows 1970531
Missing Cells 3450353
Missing Cells (%) 14.6%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 989.4 MiB
Average Row Size in Memory 526.5 B
Variable Types
  • Numerical: 4
  • Categorical: 8
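The overview figures follow from the per-column counts. The missing counts listed under Dataset Insights sum to 247101 + 1724172 + 373539 + 243879 + 861662 = 3450353 cells. Taking the row count as 1970531 (the distinct count of the fully unique Unnamed: 0 column), that gives 3450353 / (12 × 1970531) ≈ 14.6% missing cells. The sketch below is one way to recompute such an overview with pandas. The helper name dataset_stats is hypothetical, and memory_usage(deep=True) may not match the profiler's own memory accounting exactly.

```python
import pandas as pd


def dataset_stats(df: pd.DataFrame) -> dict:
    """Hypothetical helper: recompute overview-style statistics for a DataFrame."""
    n_rows, n_cols = df.shape
    missing = int(df.isna().sum().sum())      # missing cells across all columns
    duplicates = int(df.duplicated().sum())   # rows identical to an earlier row
    total_bytes = int(df.memory_usage(deep=True).sum())
    return {
        "n_variables": n_cols,
        "n_rows": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / (n_rows * n_cols),
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_rows,
        "avg_row_bytes": total_bytes / n_rows,
    }


toy = pd.DataFrame({"a": [1, 2, None, 4], "b": ["x", None, None, "y"]})
stats = dataset_stats(toy)
print(stats["missing_cells"], stats["missing_cells_pct"])  # 3 37.5
```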

Dataset Insights

  • Result numeric has 247101 (12.54%) missing values
  • Result textual has 1724172 (87.5%) missing values
  • Result range has 373539 (18.96%) missing values
  • Result units has 243879 (12.38%) missing values
  • Specimen source has 861662 (43.73%) missing values
  • Result numeric is skewed
  • Lab test date has high cardinality: 288588 distinct values
  • Lab test has high cardinality: 29706 distinct values
  • Lab test description has high cardinality: 24071 distinct values
  • Result textual has high cardinality: 2982 distinct values
  • Result range has high cardinality: 8665 distinct values
  • Result units has high cardinality: 1266 distinct values
  • Specimen source has high cardinality: 71 distinct values
  • Lab test date has a constant length of 21 characters
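The insights above come from per-column thresholds that the report does not state. The pandas sketch below reproduces the three main kinds of flag (missing values, skew, high cardinality) under assumed cutoffs. A "more than 50 distinct values" rule fits the output (Specimen source, with 71 values, is flagged while State, with 50, is not). A skewness cutoff of 1 is only one value consistent with Result numeric being flagged and Age at lab test (γ1 = -0.1569) not. Any missing-value cutoff below 12.38% would yield the same five missing-value flags. The helper name dataset_insights and all three constants are hypothetical.

```python
import pandas as pd

# Hypothetical thresholds: the report does not print the cutoffs it uses.
MISSING_PCT_THRESHOLD = 1.0   # flag columns with more than 1% missing values
CARDINALITY_THRESHOLD = 50    # flag non-numeric columns with more than 50 distinct values
SKEW_THRESHOLD = 1.0          # flag numeric columns whose |skewness| exceeds 1


def dataset_insights(df: pd.DataFrame) -> list:
    """Return (column, flag) pairs in the style of the Dataset Insights list."""
    flags = []
    n_rows = len(df)
    for col in df.columns:
        s = df[col]
        if 100 * s.isna().sum() / n_rows > MISSING_PCT_THRESHOLD:
            flags.append((col, "Missing"))
        if pd.api.types.is_numeric_dtype(s):
            if abs(s.skew()) > SKEW_THRESHOLD:
                flags.append((col, "Skewed"))
        elif s.nunique() > CARDINALITY_THRESHOLD:
            flags.append((col, "High Cardinality"))
    return flags


toy = pd.DataFrame({
    "num": [1.0] * 59 + [1000.0],        # one extreme value -> strong right skew
    "cat": [f"v{i}" for i in range(60)],  # 60 distinct strings
    "opt": [None] * 10 + ["a"] * 50,      # about 16.7% missing
})
print(dataset_insights(toy))
# [('num', 'Skewed'), ('cat', 'High Cardinality'), ('opt', 'Missing')]
```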

Variables


Unnamed: 0

numerical

Approximate Distinct Count 1970531
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 31528496 B
Mean 1.6393e+08
Minimum 10813
Maximum 328841156
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Unnamed: 0 is nearly symmetric, with a negligible right skew (γ1 = 0.0013)

Quantile Statistics

Minimum 10813
5th Percentile 1.2977e+07
Q1 7.9328e+07
Median 1.6093e+08
Q3 2.4774e+08
95th Percentile 3.1332e+08
Maximum 328841156
Range 328830343
IQR 1.6841e+08
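Range is Maximum minus Minimum and IQR is Q3 minus Q1 (here 328841156 − 10813 = 328830343 and 2.4774e+08 − 7.9328e+07 ≈ 1.6841e+08). The pandas sketch below assumes linear interpolation between order statistics, which is pandas' default. The profiler may use a different or approximate quantile method on a table of this size, so small differences are possible. The helper name quantile_stats is hypothetical.

```python
import pandas as pd


def quantile_stats(s: pd.Series) -> dict:
    """Hypothetical helper: quantile statistics computed on non-missing values."""
    s = s.dropna()
    q = s.quantile([0.05, 0.25, 0.5, 0.75, 0.95])  # linear interpolation by default
    return {
        "minimum": s.min(),
        "p5": q.loc[0.05],
        "q1": q.loc[0.25],
        "median": q.loc[0.5],
        "q3": q.loc[0.75],
        "p95": q.loc[0.95],
        "maximum": s.max(),
        "range": s.max() - s.min(),
        "iqr": q.loc[0.75] - q.loc[0.25],
    }


stats = quantile_stats(pd.Series(range(1, 101)))
print(stats["q1"], stats["median"], stats["q3"], stats["iqr"])  # 25.75 50.5 75.25 49.5
```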

Descriptive Statistics

Mean 1.6393e+08
Standard Deviation 9.7063e+07
Variance 9.4213e+15
Sum 3.2302e+14
Skewness 0.00131
Kurtosis -1.2241
Coefficient of Variation 0.5921
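The derived rows follow from the mean μ and standard deviation σ. The worked arithmetic below takes n = 1970531, the distinct count of this fully unique column, and agrees with the reported values up to rounding. The kurtosis of -1.2241 is also close to -1.2, the excess kurtosis of a continuous uniform distribution, which fits a row index spread evenly over its range.

```latex
\begin{aligned}
\sigma^2 &= (9.7063\times10^{7})^2 \approx 9.421\times10^{15} \\
\mathrm{CV} &= \frac{\sigma}{\mu} = \frac{9.7063\times10^{7}}{1.6393\times10^{8}} \approx 0.5921 \\
\textstyle\sum_i x_i &= n\,\mu \approx 1970531 \times 1.6393\times10^{8} \approx 3.230\times10^{14}
\end{aligned}
```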

Internalpatientid

numerical

Approximate Distinct Count 983
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 31528496 B
Mean 82496.372
Minimum 67
Maximum 168899
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Internalpatientid is nearly symmetric, with a negligible right skew (γ1 = 0.0245)

Quantile Statistics

Minimum 67
5th Percentile 8281
Q1 42933
Median 84413
Q3 124950
95th Percentile 158264
Maximum 168899
Range 168832
IQR 82017

Descriptive Statistics

Mean 82496.372
Standard Deviation 48291.1942
Variance 2.332e+09
Sum 1.6256e+11
Skewness 0.02452
Kurtosis -1.1967
Coefficient of Variation 0.5854
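Approximate Unique (%) appears to be the distinct count divided by the number of non-missing values. For Result numeric, 996811 / (1970531 − 247101) ≈ 57.8%. For Internalpatientid, 983 / 1970531 ≈ 0.05%, which the report rounds to 0.0%, so each identifier covers roughly 2000 rows on average. The pandas sketch below shows that computation. Note that nunique is exact, whereas the report labels its counts approximate, so on large data the two may differ slightly. The helper name unique_pct is hypothetical.

```python
import pandas as pd


def unique_pct(s: pd.Series) -> float:
    """Hypothetical helper: distinct values as a percentage of non-missing values."""
    non_missing = s.dropna()
    return 100 * non_missing.nunique() / len(non_missing)


patient_ids = pd.Series([7, 7, 7, 9, 9, None])  # toy identifiers with one missing value
print(unique_pct(patient_ids))                  # 40.0
print(patient_ids.value_counts().to_dict())     # rows per identifier
```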

Age at lab test

numerical

Approximate Distinct Count 1970465
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 31528496 B
Mean 70.304
Minimum 25.0523
Maximum 104.1697
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Age at lab test is slightly skewed left (γ1 = -0.1569)

Quantile Statistics

Minimum 25.0523
5th Percentile 51.563
Q1 62.2312
Median 70.7313
Q3 79.1164
95th Percentile 89.077
Maximum 104.1697
Range 79.1174
IQR 16.8852

Descriptive Statistics

Mean 70.304
Standard Deviation 11.6253
Variance 135.1471
Sum 1.3854e+08
Skewness -0.1569
Kurtosis -0.2418
Coefficient of Variation 0.1654
  • Age at lab test is not normally distributed (normality test p-value ≈ 2.71e-08)
  • Age at lab test has 6477 outliers
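The report does not say how outliers are defined. One common rule, assumed here, is Tukey's fences, which flag values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR. With the quartiles above, the fences would be 62.2312 − 25.3278 = 36.9034 and 79.1164 + 25.3278 = 104.4442. The maximum age (104.1697) lies inside the upper fence, so under this rule all 6477 outliers would be ages below about 36.9 years. The function name iqr_outliers and the default multiplier k = 1.5 are assumptions.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: values outside Tukey's fences Q1 - k*IQR and Q3 + k*IQR."""
    s = s.dropna()
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)]


ages = pd.Series([30, 62, 65, 68, 70, 72, 75, 79, 80, 110])  # toy ages
print(iqr_outliers(ages).tolist())  # [30, 110]
```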

Lab test date

categorical

Approximate Distinct Count 288588
Approximate Unique (%) 14.6%
Missing 0
Missing (%) 0.0%
Memory Size 169465666 B

Length

Mean 21
Standard Deviation 0
Median 21
Minimum 21
Maximum 21

Sample

1st row 2002-08-01 21:31:4...
2nd row 2002-08-01 21:31:4...
3rd row 2002-08-01 21:31:4...
4th row 2002-08-01 21:31:4...
5th row 2002-08-01 21:31:4...

Character Categories

Letter Count 0
Lowercase Letter 0
Space Separator 1970531
Uppercase Letter 0
Dash Punctuation 3941062
Decimal Number 29557965
  • Lab test date has words of constant length
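The character counts pin down the timestamp layout more precisely than the truncated samples. Each 21-character value contains exactly one space (1970531 / 1970531), two dashes (3941062 / 1970531) and 15 digits (29557965 / 1970531). "YYYY-MM-DD HH:MM:SS" accounts for 19 characters and 14 digits, which leaves one extra digit and one other character, consistent with a single fractional-second digit such as "SS.f". That layout is an inference, not something the report states. The sketch below parses it with pandas, and the sample strings are invented.

```python
import pandas as pd

# Assumed layout, inferred from the character counts; check it against the raw data.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# Hypothetical values with the same 21-character shape as the column.
raw = pd.Series(["2010-01-15 08:05:30.0", "2011-12-31 23:59:59.5"])
assert raw.str.len().eq(21).all()

parsed = pd.to_datetime(raw, format=DATE_FORMAT)  # raises ValueError if a value does not match
print(parsed.dt.year.tolist())     # [2010, 2011]
print(parsed.iloc[1].microsecond)  # 500000
```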

Lab test

categorical

Approximate Distinct Count 29706
Approximate Unique (%) 1.5%
Missing 0
Missing (%) 0.0%
Memory Size 176944648 B

Length

Mean 24.7954
Standard Deviation 8.1623
Median 24
Minimum 10
Maximum 68

Sample

1st row potassium_mmol/l_p...
2nd row urea nitrogen_mg/d...
3rd row creatinine_mg/dl_p...
4th row magnesium_mg/dl_pl...
5th row albumin_g/dl_plasm...

Character Categories

Letter Count 38286238
Lowercase Letter 38286134
Space Separator 2175646
Uppercase Letter 104
Dash Punctuation 418685
Decimal Number 998576
  • The most frequent word (specimen) occurs over 7.84 times as often as the second most frequent word (urine)

Lab test description

categorical

Approximate Distinct Count 24071
Approximate Unique (%) 1.2%
Missing 0
Missing (%) 0.0%
Memory Size 149362716 B

Length

Mean 10.7982
Standard Deviation 6.9598
Median 10
Minimum 1
Maximum 40

Sample

1st row POTASSIUM
2nd row UREA NITROGEN
3rd row CREATININE
4th row MAGNESIUM
5th row ALBUMIN

Character Categories

Letter Count 17343378
Lowercase Letter 2296085
Space Separator 1291924
Uppercase Letter 15047293
Dash Punctuation 412322
Decimal Number 795379
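The category names in these blocks match Unicode general categories (Ll, Lu, Zs, Pd, Nd). In every column, the letter total equals Lowercase Letter plus Uppercase Letter (here 2296085 + 15047293 = 17343378). The stdlib sketch below tallies characters with unicodedata.category. The mapping from report labels to category codes is inferred from the names, and the helper name char_categories is hypothetical.

```python
import unicodedata
from collections import Counter

# Report labels mapped to Unicode general category codes (mapping inferred from the names).
CATEGORIES = {
    "Lowercase Letter": "Ll",
    "Uppercase Letter": "Lu",
    "Space Separator": "Zs",
    "Dash Punctuation": "Pd",
    "Decimal Number": "Nd",
}


def char_categories(values) -> dict:
    """Hypothetical helper: count characters per Unicode category over string values."""
    counts = Counter()
    for v in values:
        if isinstance(v, str):  # skip missing values (None or NaN)
            counts.update(unicodedata.category(ch) for ch in v)
    result = {label: counts[code] for label, code in CATEGORIES.items()}
    result["Letter Count"] = result["Lowercase Letter"] + result["Uppercase Letter"]
    return result


print(char_categories(["UREA NITROGEN", "HbA1c", "K-1", None]))
# {'Lowercase Letter': 2, 'Uppercase Letter': 15, 'Space Separator': 1,
#  'Dash Punctuation': 1, 'Decimal Number': 2, 'Letter Count': 17}
```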

Result numeric

numerical

Approximate Distinct Count 996811
Approximate Unique (%) 57.8%
Missing 247101
Missing (%) 12.5%
Infinite 0
Infinite (%) 0.0%
Memory Size 27574880 B
Mean 76704.64
Minimum -77.0886
Maximum 5.3566e+10
Zeros 34844
Zeros (%) 1.8%
Negatives 1500
Negatives (%) 0.1%
  • Result numeric is skewed right (γ1 = 883.5033)

Quantile Statistics

Minimum -77.0886
5th Percentile 0.2071
Q1 3.8725
Median 15
Q3 71.1081
95th Percentile 197
Maximum 5.3566e+10
Range 5.3566e+10
IQR 67.2356

Descriptive Statistics

Mean 76704.64
Standard Deviation 5.8469e+07
Variance 3.4187e+15
Sum 1.322e+11
Skewness 883.5033
Kurtosis 801151.884
Coefficient of Variation 762.268
  • Result numeric is not normally distributed (normality test p-value ≈ 4.23e-25)
  • Result numeric has 105800 outliers
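Result numeric shows how a few extreme values dominate moment-based statistics. The median is 15 and the 95th percentile 197, yet the maximum of 5.3566e+10 drives the mean to 76704.64 and the coefficient of variation to 762. Suppose the outlier count uses the common 1.5 × IQR (Tukey) rule, which the report does not confirm. The upper fence would then be 71.1081 + 1.5 × 67.2356 ≈ 171.96. Since the 95th percentile already exceeds it, more than 5% of the 1723430 non-missing values would be flagged, in line with 105800 (about 6.1%). The toy pandas sketch below shows the same effect: one extreme value moves the mean and skewness but not the median or IQR.

```python
import pandas as pd

# Toy data (invented): 99 ordinary results and one extreme value.
values = pd.Series([1.0] * 99 + [1e6])

summary = {
    "mean": values.mean(),      # pulled far upward by the single extreme value
    "median": values.median(),  # unchanged by it
    "iqr": values.quantile(0.75) - values.quantile(0.25),  # also unchanged (0 here)
    "skew": values.skew(),      # pandas' bias-adjusted sample skewness
}
print(summary["mean"], summary["median"], summary["iqr"])  # 10000.99 1.0 0.0
```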

Result textual

categorical

Approximate Distinct Count 2982
Approximate Unique (%) 1.2%
Missing 1724172
Missing (%) 87.5%
Memory Size 17347610 B

Length

Mean 5.416
Standard Deviation 2.7938
Median 5
Minimum 1
Maximum 54

Sample

1st row 0 = Negative
2nd row 0 = Negative
3rd row 0 = Negative
4th row N
5th row N

Character Categories

Letter Count 1177269
Lowercase Letter 435880
Space Separator 9991
Uppercase Letter 741389
Dash Punctuation 12032
Decimal Number 75390

Result range

categorical

Approximate Distinct Count 8665
Approximate Unique (%) 0.5%
Missing 373539
Missing (%) 19.0%
Memory Size 117077114 B
  • The most frequent value (70 - 110) occurs over 1.58 times as often as the second most frequent value (0 - 2)
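This insight compares how often the two most common values occur. The pandas sketch below assumes that missing values are excluded, which is the value_counts default. The toy counts are invented and only reuse the two labels from the report. The helper name top_value_ratio is hypothetical.

```python
import pandas as pd


def top_value_ratio(s: pd.Series) -> tuple:
    """Hypothetical helper: most and second-most frequent values and their count ratio."""
    counts = s.value_counts()  # sorted by frequency, missing values excluded
    return counts.index[0], counts.index[1], float(counts.iloc[0] / counts.iloc[1])


ranges = pd.Series(["70 - 110"] * 3 + ["0 - 2"] * 2 + ["3.5 - 4.8", None])
print(top_value_ratio(ranges))  # ('70 - 110', '0 - 2', 1.5)
```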

Length

Mean 8.311
Standard Deviation 1.6685
Median 8
Minimum 5
Maximum 23

Sample

1st row 3.5 - 4.8
2nd row 7 - 20
3rd row .8 - 1.6
4th row 1.6 - 2.4
5th row 3.5 - 4.8

Character Categories

Letter Count 0
Lowercase Letter 0
Space Separator 3193984
Uppercase Letter 0
Dash Punctuation 1599395
Decimal Number 7029153
  • The most frequent word (0) occurs over 1.89 times as often as the second most frequent word (70)

Result units

categorical

Approximate Distinct Count 1266
Approximate Unique (%) 0.1%
Missing 243879
Missing (%) 12.4%
Memory Size 119442797 B

Length

Mean 4.176
Standard Deviation 1.8591
Median 5
Minimum 1
Maximum 20

Sample

1st row mmol/L
2nd row mg/dL
3rd row mg/dL
4th row mg/dL
5th row g/dL

Character Categories

Letter Count 5387599
Lowercase Letter 3588156
Space Separator 17021
Uppercase Letter 1799443
Dash Punctuation 5973
Decimal Number 203045
  • The most frequent word (mgdl) occurs over 2.73 times as often as the second most frequent word (mmoll)
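The words "mgdl" and "mmoll" suggest that this check lowercases values and strips punctuation before splitting them into words, so "mg/dL" is counted as "mgdl". The stdlib sketch below uses that assumed tokenization, since the report does not document its rules. The toy units list is invented, and the helper name top_word_ratio is hypothetical.

```python
import re
from collections import Counter


def top_word_ratio(values) -> tuple:
    """Hypothetical helper: two most frequent words and their count ratio."""
    words = Counter()
    for v in values:
        if not isinstance(v, str):  # skip missing values
            continue
        cleaned = re.sub(r"[^\w\s]", "", v.lower())  # assumed: drop punctuation such as "/"
        words.update(cleaned.split())
    (w1, c1), (w2, c2) = words.most_common(2)
    return w1, w2, c1 / c2


units = ["mg/dL"] * 4 + ["mmol/L"] * 2 + ["g/dL", None]
print(top_word_ratio(units))  # ('mgdl', 'mmoll', 2.0)
```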

Specimen source

categorical

Approximate Distinct Count 71
Approximate Unique (%) 0.0%
Missing 861662
Missing (%) 43.7%
Memory Size 78055432 B

Length

Mean 5.3919
Standard Deviation 0.7367
Median 5
Minimum 4
Maximum 32

Sample

1st row plasma
2nd row plasma
3rd row plasma
4th row plasma
5th row plasma

Character Categories

Letter Count 5965676
Lowercase Letter 5965676
Space Separator 5113
Uppercase Letter 0
Dash Punctuation 390
Decimal Number 154

State

categorical

Approximate Distinct Count 50
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 144647590 B

Length

Mean 8.4054
Standard Deviation 2.6188
Median 8
Minimum 4
Maximum 20

Sample

1st row New Mexico
2nd row New Mexico
3rd row New Mexico
4th row New Mexico
5th row New Mexico

Character Categories

Letter Count 16221539
Lowercase Letter 13921278
Space Separator 341536
Uppercase Letter 2300261
Dash Punctuation 0
Decimal Number 0

Interactions

Correlations

Missing Values

[Interaction, correlation, and missing-value plots not available in text form]